In recent years, applying deep learning (DL) to assess structural damages has gained growing popularity in vision-based structural health monitoring (SHM). However, both data deficiency and class-imbalance hinder the wide adoption of DL in practical applications of SHM. Common mitigation strategies include transfer learning, over-sampling, and under-sampling, yet these ad-hoc methods only provide limited performance boost that varies from one case to another. In this work, we introduce one variant of the Generative Adversarial Network (GAN), named the balanced semi-supervised GAN (BSS-GAN). It adopts the semi-supervised learning concept and applies balanced-batch sampling in training to resolve low-data and imbalanced-class problems. A series of computer experiments on concrete cracking and spalling classification were conducted under the low-data imbalanced-class regime with limited computing power. The results show that the BSS-GAN is able to achieve better damage detection in terms of recall and $F_\beta$ score than other conventional methods, indicating its state-of-the-art performance.
translated by 谷歌翻译
这项工作提出了一种新的计算框架,用于学习用于真实数据集的明确生成模型。特别地,我们建议在包含多个独立的多维线性子空间组成的特征空间中的多类多维数据分发和{线性判别表示(LDR)}之间学习{\ EM闭环转录}。特别地,我们认为寻求的最佳编码和解码映射可以被配制为编码器和解码器之间的{\ em二手最小游戏的均衡点}。该游戏的自然实用功能是所谓的{\ em速率减少},这是一个简单的信息定理措施,用于特征空间中子空间类似的高斯的混合物之间的距离。我们的配方利用来自控制系统的闭环误差反馈的灵感,避免昂贵的评估和最小化数据空间或特征空间的任意分布之间的近似距离。在很大程度上,这种新的制定统一了自动编码和GaN的概念和益处,并自然将它们扩展到学习多级和多维实际数据的判别和生成}表示的设置。我们对许多基准图像数据集的广泛实验表明了这种新的闭环配方的巨大潜力:在公平的比较下,学习的解码器的视觉质量和编码器的分类性能是竞争力的,并且通常比基于GaN,VAE或基于GaN,VAE或基于GaN,VAE的方法更好的方法两者的组合。我们注意到所以,不同类别的特征在特征空间中明确地映射到大约{em独立的主管子空间};每个类中的不同视觉属性由每个子空间中的{\ em独立主体组件}建模。
translated by 谷歌翻译
Reinforcement learning (RL) problems can be challenging without well-shaped rewards. Prior work on provably efficient RL methods generally proposes to address this issue with dedicated exploration strategies. However, another way to tackle this challenge is to reformulate it as a multi-task RL problem, where the task space contains not only the challenging task of interest but also easier tasks that implicitly function as a curriculum. Such a reformulation opens up the possibility of running existing multi-task RL methods as a more efficient alternative to solving a single challenging task from scratch. In this work, we provide a theoretical framework that reformulates a single-task RL problem as a multi-task RL problem defined by a curriculum. Under mild regularity conditions on the curriculum, we show that sequentially solving each task in the multi-task RL problem is more computationally efficient than solving the original single-task problem, without any explicit exploration bonuses or other exploration strategies. We also show that our theoretical insights can be translated into an effective practical learning algorithm that can accelerate curriculum learning on simulated robotic tasks.
translated by 谷歌翻译
In recent years, large amounts of effort have been put into pushing forward the real-world application of dynamic digital human (DDH). However, most current quality assessment research focuses on evaluating static 3D models and usually ignores motion distortions. Therefore, in this paper, we construct a large-scale dynamic digital human quality assessment (DDH-QA) database with diverse motion content as well as multiple distortions to comprehensively study the perceptual quality of DDHs. Both model-based distortion (noise, compression) and motion-based distortion (binding error, motion unnaturalness) are taken into consideration. Ten types of common motion are employed to drive the DDHs and a total of 800 DDHs are generated in the end. Afterward, we render the video sequences of the distorted DDHs as the evaluation media and carry out a well-controlled subjective experiment. Then a benchmark experiment is conducted with the state-of-the-art video quality assessment (VQA) methods and the experimental results show that existing VQA methods are limited in assessing the perceptual loss of DDHs. The database will be made publicly available to facilitate future research.
translated by 谷歌翻译
Estimating the 6D pose of objects is one of the major fields in 3D computer vision. Since the promising outcomes from instance-level pose estimation, the research trends are heading towards category-level pose estimation for more practical application scenarios. However, unlike well-established instance-level pose datasets, available category-level datasets lack annotation quality and provided pose quantity. We propose the new category level 6D pose dataset HouseCat6D featuring 1) Multi-modality of Polarimetric RGB+P and Depth, 2) Highly diverse 194 objects of 10 household object categories including 2 photometrically challenging categories, 3) High-quality pose annotation with an error range of only 1.35 mm to 1.74 mm, 4) 41 large scale scenes with extensive viewpoint coverage, 5) Checkerboard-free environment throughout the entire scene. We also provide benchmark results of state-of-the-art category-level pose estimation networks.
translated by 谷歌翻译
Data-Free Class Incremental Learning (DFCIL) aims to sequentially learn tasks with access only to data from the current one. DFCIL is of interest because it mitigates concerns about privacy and long-term storage of data, while at the same time alleviating the problem of catastrophic forgetting in incremental learning. In this work, we introduce robust saliency guidance for DFCIL and propose a new framework, which we call RObust Saliency Supervision (ROSS), for mitigating the negative effect of saliency drift. Firstly, we use a teacher-student architecture leveraging low-level tasks to supervise the model with global saliency. We also apply boundary-guided saliency to protect it from drifting across object boundaries at intermediate layers. Finally, we introduce a module for injecting and recovering saliency noise to increase robustness of saliency preservation. Our experiments demonstrate that our method can retain better saliency maps across tasks and achieve state-of-the-art results on the CIFAR-100, Tiny-ImageNet and ImageNet-Subset DFCIL benchmarks. Code will be made publicly available.
translated by 谷歌翻译
Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model. In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes, making it possible to tailor the model to different compute budgets at deployment time. We extensively evaluate the resulting model, which we call FlexiViT, on a wide range of tasks, including classification, image-text retrieval, open-world detection, panoptic segmentation, and semantic segmentation, concluding that it usually matches, and sometimes outperforms, standard ViT models trained at a single patch size in an otherwise identical setup. Hence, FlexiViT training is a simple drop-in improvement for ViT that makes it easy to add compute-adaptive capabilities to most models relying on a ViT backbone architecture. Code and pre-trained models are available at https://github.com/google-research/big_vision
translated by 谷歌翻译
Semantic segmentation usually benefits from global contexts, fine localisation information, multi-scale features, etc. To advance Transformer-based segmenters with these aspects, we present a simple yet powerful semantic segmentation architecture, termed as IncepFormer. IncepFormer has two critical contributions as following. First, it introduces a novel pyramid structured Transformer encoder which harvests global context and fine localisation features simultaneously. These features are concatenated and fed into a convolution layer for final per-pixel prediction. Second, IncepFormer integrates an Inception-like architecture with depth-wise convolutions, and a light-weight feed-forward module in each self-attention layer, efficiently obtaining rich local multi-scale object features. Extensive experiments on five benchmarks show that our IncepFormer is superior to state-of-the-art methods in both accuracy and speed, e.g., 1) our IncepFormer-S achieves 47.7% mIoU on ADE20K which outperforms the existing best method by 1% while only costs half parameters and fewer FLOPs. 2) Our IncepFormer-B finally achieves 82.0% mIoU on Cityscapes dataset with 39.6M parameters. Code is available:github.com/shendu0321/IncepFormer.
translated by 谷歌翻译
A common scenario of Multilingual Neural Machine Translation (MNMT) is that each translation task arrives in a sequential manner, and the training data of previous tasks is unavailable. In this scenario, the current methods suffer heavily from catastrophic forgetting (CF). To alleviate the CF, we investigate knowledge distillation based life-long learning methods. Specifically, in one-tomany scenario, we propose a multilingual distillation method to make the new model (student) jointly learn multilingual output from old model (teacher) and new task. In many-to one scenario, we find that direct distillation faces the extreme partial distillation problem, and we propose two different methods to address it: pseudo input distillation and reverse teacher distillation. The experimental results on twelve translation tasks show that the proposed methods can better consolidate the previous knowledge and sharply alleviate the CF.
translated by 谷歌翻译
Efficient ObjectGoal navigation (ObjectNav) in novel environments requires an understanding of the spatial and semantic regularities in environment layouts. In this work, we present a straightforward method for learning these regularities by predicting the locations of unobserved objects from incomplete semantic maps. Our method differs from previous prediction-based navigation methods, such as frontier potential prediction or egocentric map completion, by directly predicting unseen targets while leveraging the global context from all previously explored areas. Our prediction model is lightweight and can be trained in a supervised manner using a relatively small amount of passively collected data. Once trained, the model can be incorporated into a modular pipeline for ObjectNav without the need for any reinforcement learning. We validate the effectiveness of our method on the HM3D and MP3D ObjectNav datasets. We find that it achieves the state-of-the-art on both datasets, despite not using any additional data for training.
translated by 谷歌翻译